Poster: Larry M Date: Feb 3, 2004 2:12pm
Forum: millionbooks Subject: JBIG2 compression

I posted this in the 'text archive' forum, but the topic is really about the 'million books' project.

I took one of the books (Early Jazz) and tried converting it to PDF using Acrobat 6 with a variety of settings. With JBIG2 "lossy" compression, the 88 MB .tif file became a 7.34 MB .pdf, which you can browse at:

http://larry.masinter.net/earlyjazz-notext.pdf

If you also OCR the pages, so that the same file holds searchable text as well as the page images, it comes to 16.7 MB:

http://larry.masinter.net/earlyjazz.pdf

This compares to 33.9 MB for DjVu.
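
To put those sizes side by side, here is a rough sketch that works out how much smaller each output is than the 88 MB source TIF. The sizes are the ones quoted above; the labels and the `compression_ratio` helper are just names made up for this illustration, and the arithmetic says nothing about image quality.

```python
# Sizes in MB as quoted in this post; the source TIF is the baseline.
SOURCE_TIF_MB = 88.0

# Labels are illustrative only, not official format names.
outputs_mb = {
    "PDF, JBIG2 lossy, images only": 7.34,
    "PDF, JBIG2 lossy, with OCR text": 16.7,
    "DjVu": 33.9,
}


def compression_ratio(original_mb, compressed_mb):
    """Return how many times smaller the compressed file is than the original."""
    return original_mb / compressed_mb


ratios = {
    name: compression_ratio(SOURCE_TIF_MB, size)
    for name, size in outputs_mb.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {outputs_mb[name]} MB, about {ratio:.1f}x smaller than the TIF")
```

By this measure the image-only PDF is roughly 12 times smaller than the TIF, the PDF with OCR text roughly 5 times, and the DjVu file roughly 2.6 times.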

I think you might need Adobe Reader 6 to open these files:
http://www.adobe.com/products/acrobat/readstep2.html

Using "reduce file size" with JBIG2 also cleans up the text images a bit, since it reduces noise.

Anyway, I'm interested in helping (with software, etc.) if someone wants to pursue this for 'million books'; let me know.

Larry
--
http://larry.masinter.net